The receptive field (RF), which determines the region of a time series that is "seen" and used, is critical to improving performance in time series classification (TSC). However, the variation of signal scales across and within time series data makes it challenging to decide on proper RF sizes for TSC. In this paper, we propose a dynamic sparse network (DSN) with sparse connections for TSC, which can learn to cover diverse RF sizes without cumbersome hyper-parameter tuning. The kernels in each sparse layer are sparse and can be explored within constrained regions by dynamic sparse training, which makes it possible to reduce the resource cost. Experimental results show that the proposed DSN model achieves state-of-the-art performance on both univariate and multivariate TSC datasets at less than 50% of the computational cost of recent baseline methods, opening the path towards more accurate, resource-aware methods for time series analysis. Our code is publicly available at: https://github.com/QiaoXiao7282/DSN.
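As an illustration of the dynamic-sparse-training idea described above, here is a minimal PyTorch sketch of one prune-and-grow update, assuming magnitude-based pruning and random regrowth; the function and its interface are illustrative, not taken from the DSN codebase (which additionally constrains the regions in which kernel weights may grow):

```python
import torch

def prune_and_grow(weight, mask, drop_frac=0.3):
    """One dynamic-sparse-training update: deactivate the smallest-magnitude
    active weights, then regrow the same number at random inactive slots."""
    active = mask.view(-1).bool()
    n_drop = int(drop_frac * int(active.sum()))
    if n_drop == 0:
        return mask
    # Prune: drop the n_drop active weights with the smallest magnitude.
    magnitude = weight.detach().abs().view(-1).masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitude, n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0
    # Grow: reactivate positions that were inactive before this update,
    # so freshly pruned weights are not immediately revived.
    grow_pool = (~active).nonzero(as_tuple=True)[0]
    grow_idx = grow_pool[torch.randperm(grow_pool.numel())[:n_drop]]
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0  # newly grown weights start at zero
    return mask
```

The mask multiplies the kernel at every forward pass; running this update periodically during training lets the sparse connectivity migrate toward useful positions at a fixed parameter budget.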
Since the advent of the Vision Transformer (ViT), Transformers have quickly shone in the computer vision world. The dominant role of convolutional neural networks (CNNs) appears to be challenged by increasingly effective Transformer-based models. Recently, several advanced convolutional models have struck back with large kernels motivated by the local but large attention mechanism, showing appealing performance and efficiency. While one of them (i.e., RepLKNet) impressively manages to scale the kernel size to 31x31 with improved performance, the performance starts to saturate as the kernel size continues to grow, compared with the scaling trend of advanced ViTs such as Swin Transformer. In this paper, we explore the possibility of training extreme convolutions larger than 31x31 and test whether the performance gap can be eliminated by strategically enlarging convolutions. This study ends with a recipe for applying extremely large kernels from the perspective of sparsity, which can smoothly scale up kernels to 61x61 with better performance. We propose the Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that performs on par with or better than state-of-the-art hierarchical Transformers and modern ConvNet architectures such as ConvNeXt and RepLKNet on ImageNet classification as well as typical downstream tasks. Our code is available at https://github.com/vita-group/slak.
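The abstract does not spell out the sparsity recipe; one common way to make such extreme kernels tractable is to decompose them into rectangular components. Below is a hedged PyTorch sketch of that decomposition, assuming a 51x5 / 5x51 depthwise pair plus a small square kernel (the names and exact structure are assumptions; SLaK additionally applies dynamic sparsity to the weights, which this omits):

```python
import torch.nn as nn

class LargeKernelBranch(nn.Module):
    """Sketch of a rectangular decomposition for very large kernels: a
    k x k depthwise convolution is replaced by parallel k x s and s x k
    depthwise convolutions plus a small s x s kernel, summed together."""
    def __init__(self, dim, k=51, s=5):
        super().__init__()
        self.kxs = nn.Conv2d(dim, dim, (k, s), padding=(k // 2, s // 2), groups=dim)
        self.sxk = nn.Conv2d(dim, dim, (s, k), padding=(s // 2, k // 2), groups=dim)
        self.sxs = nn.Conv2d(dim, dim, s, padding=s // 2, groups=dim)

    def forward(self, x):
        return self.kxs(x) + self.sxk(x) + self.sxs(x)
```

The two rectangular branches cover the full k x k receptive field at a fraction of the parameter count, while the small square kernel preserves local detail.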
Affordance-centric Question-driven Task Completion (AQTC) for egocentric assistants is a novel task that helps AI assistants learn from instructional videos and scripts and guide users step by step. In this paper, we address AQTC via a two-stage, function-centric approach consisting of a Question2Function module that grounds the question in relevant functions and a Function2Answer module that predicts actions based on historical steps. We evaluate several possible solutions for each module and obtain significant gains over the given baselines. Our code is available at https://github.com/starsholic/LOVEU-CVPR22-AQTC.
Ultrasound is progressing toward becoming an affordable and versatile solution for medical imaging. With the advent of the COVID-19 global pandemic, there is a need to fully automate ultrasound imaging, as it otherwise requires trained operators in close proximity to patients for long periods of time. In this work, we investigate the important yet seldom-studied problem of scan target localization in the setting of lung ultrasound imaging. We propose a purely vision-based, data-driven method that incorporates learning-based computer vision techniques. We combine a human pose estimation model with a specially designed regression model to predict the lung ultrasound scan targets, and deploy multiview stereo vision to enhance the consistency of 3D target localization. While related works mostly focus on phantom experiments, we collect data from 30 human subjects for testing. Our method attains an accuracy of 15.52 (9.47) mm for probe positioning and 4.32 (3.69)° for probe orientation, with a success rate above 80% under an error threshold of 25 mm for all scan targets. Moreover, our approach can serve as a general solution for other types of ultrasound modalities. The code for implementation has been released.
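The abstract does not include the geometry code, but the multiview-stereo step reduces to standard triangulation of a target from two views. A minimal sketch using the classic linear (DLT) method, assuming two calibrated cameras; the function name and interface are illustrative, not from the released code:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its pixel
    coordinates x1, x2 in two views with 3x4 projection matrices P1, P2.
    Returns the point in inhomogeneous (Euclidean) coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]
```

Triangulating each predicted scan target from multiple views and checking the reprojection residual is one way such a pipeline can enforce 3D consistency.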
Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we formulate a novel quantization training paradigm that compresses the embeddings from the training stage onward, termed low-precision training (LPT), and we provide a theoretical analysis of its convergence. The results show that stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization in LPT. Further, to reduce accuracy degradation, we propose adaptive low-precision training (ALPT), which learns the step size (i.e., the quantization resolution) through gradient descent. Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve prediction accuracy, especially at extremely low bit widths. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy. The code of ALPT is publicly available.
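As a hedged illustration of the stochastic weight quantization the analysis compares, here is a PyTorch sketch of unbiased stochastic rounding with a learnable step size, using a straight-through estimator so gradients reach both the weights and the step size; ALPT's exact formulation may differ:

```python
import torch

def stochastic_quantize(w, step, n_bits=8):
    """Unbiased stochastic rounding of w onto a uniform grid with a
    learnable step size (pass step as an nn.Parameter to learn it)."""
    qmax = 2 ** (n_bits - 1) - 1
    scaled = (w / step).clamp(-qmax, qmax)
    floor = scaled.floor()
    rounded = floor + torch.bernoulli(scaled - floor)  # E[rounded] = scaled
    # Straight-through on the rounding: forward uses `rounded`,
    # backward treats it as the identity on `scaled`.
    q = (rounded - scaled).detach() + scaled
    return q * step
```

Because the rounding is unbiased in expectation, the quantization noise averages out over updates, which is the intuition behind its faster convergence compared with deterministic rounding.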
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of fully trained dense networks at initialization, without any optimization of the network's weights (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) remains mysterious. In this paper, we carry out a first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we find untrained sparse subnetworks at initialization that can match the performance of fully trained dense GNNs. Beyond this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks exhibit appealing performance in out-of-distribution detection and robustness to input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).
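The abstract leaves the search procedure implicit; one standard way to find matching subnetworks at initialization is edge-popup-style score learning, sketched below in PyTorch. This is an assumed illustration of the generic technique, not necessarily the paper's exact method:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Find a subnetwork inside a frozen, randomly initialized layer by
    learning a score per weight and keeping the highest-scoring fraction
    at each forward pass; only the scores receive gradients."""
    def __init__(self, in_dim, out_dim, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim), requires_grad=False)
        nn.init.kaiming_normal_(self.weight)      # weights stay at initialization
        self.scores = nn.Parameter(torch.rand(out_dim, in_dim))
        self.sparsity = sparsity

    def forward(self, x):
        n = self.scores.numel()
        k = int((1.0 - self.sparsity) * n)        # number of kept weights
        threshold = self.scores.flatten().kthvalue(n - k).values
        hard = (self.scores > threshold).float()
        # Straight-through: forward uses the hard mask, backward flows to scores.
        mask = hard + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)
```

Training only the scores leaves the random weights untouched, so the resulting subnetwork is genuinely "untrained" in the sense used above.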
The problem of sampling constrained continuous distributions frequently arises in many machine/statistical learning models. Many Markov chain Monte Carlo (MCMC) sampling methods have been adapted to handle different types of constraints on random variables. Among these methods, Hamiltonian Monte Carlo (HMC) and related approaches have significant advantages in computational efficiency compared with other counterparts. In this paper, we first review HMC and some extended sampling methods, and then concretely explain three constrained HMC-based sampling methods: reflection, reformulation, and spherical HMC. For illustration, we apply these methods to solve three well-known constrained sampling problems: truncated multivariate normal distributions, Bayesian regularized regression, and nonparametric density estimation. In this review, we also connect constrained sampling with another similar problem in the statistical design of experiments with constrained design spaces.
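To make the reflection variant concrete, here is a minimal NumPy sketch of a leapfrog integrator that reflects the momentum at a single linear constraint normal·x >= offset, assuming a unit-norm constraint normal; the reviewed methods handle more general constraint sets:

```python
import numpy as np

def leapfrog_reflect(x, p, grad_U, eps, n_steps, normal, offset):
    """Leapfrog integration for HMC with momentum reflection at the
    linear constraint normal @ x >= offset (normal must be unit-norm)."""
    p = p - 0.5 * eps * grad_U(x)              # half step for momentum
    for step in range(n_steps):
        x = x + eps * p                        # full step for position
        violation = offset - normal @ x
        if violation > 0:                      # left the feasible region:
            x = x + 2 * violation * normal     # mirror position across boundary
            p = p - 2 * (normal @ p) * normal  # reflect momentum off boundary
        if step < n_steps - 1:
            p = p - eps * grad_U(x)            # full step for momentum
    p = p - 0.5 * eps * grad_U(x)              # final half step
    return x, p
```

Because reflection preserves the kinetic energy, the trajectory stays on the correct Hamiltonian level set and the usual Metropolis acceptance step remains valid.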
Prompt-based fine-tuning of pre-trained models has proven effective for many natural language processing tasks. However, prompt tuning has not yet been explored for the biomedical domain. Biomedical words are often rare in the general domain but ubiquitous in biomedical contexts, which significantly degrades the performance of pre-trained models on downstream biomedical applications even after fine-tuning, especially in low-resource scenarios. We propose a simple yet effective approach that helps the model learn rare biomedical words during prompt tuning. Experimental results show that our method can achieve up to a 6% improvement on biomedical natural language inference tasks in few-shot vanilla prompt settings, without any extra parameters or training steps.
Lottery tickets (LTs) can discover accurate and sparse subnetworks that can be trained in isolation to match the performance of dense networks. Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning for improving performance by combining the outputs of multiple independent models. However, in the context of LTs, the benefits of ensembling are diluted, since an ensemble does not directly lead to stronger sparse subnetworks but instead leverages their predictions to make better decisions. In this work, we first observe that directly averaging the weights of adjacent learned subnetworks significantly boosts the performance of LTs. Encouraged by this observation, we further propose an alternative way to perform an "ensemble" over the subnetworks identified by iterative magnitude pruning, via a simple interpolation strategy. We call our method Lottery Pools. In contrast to the naive ensemble, which brings no performance gains to any individual subnetwork, Lottery Pools yields much stronger sparse subnetworks than the original LTs, without requiring any extra training or inference cost. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced sparse subnetworks outperform the original LTs by up to 1.88% on CIFAR-100 and 2.36% on CIFAR-100-C, and the resulting dense network surpasses the pre-trained dense model on both CIFAR-100 and CIFAR-100-C, by up to 2.22%.
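The weight-averaging step described above is straightforward to sketch. A minimal PyTorch version, assuming all checkpoints stem from the same original initialization so their weights are alignable; re-applying a pruning mask at the target sparsity after interpolation, and tuning the coefficients (e.g., greedily on validation data), are assumed details this sketch omits:

```python
import torch

def lottery_pool(state_dicts, coeffs=None):
    """Convexly interpolate the weights of subnetworks found at
    successive iterative-magnitude-pruning rounds."""
    if coeffs is None:
        coeffs = [1.0 / len(state_dicts)] * len(state_dicts)  # plain average
    pooled = {}
    for key in state_dicts[0]:
        pooled[key] = sum(c * sd[key].float() for c, sd in zip(coeffs, state_dicts))
    return pooled
```

Because the interpolation happens in weight space, the pooled model costs exactly one forward pass at inference, unlike a prediction-level ensemble.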
Standard approaches for video recognition usually operate on full input videos, which is inefficient due to the widespread spatiotemporal redundancy in videos. Recent progress in masked video modeling (i.e., VideoMAE) has revealed the ability of vanilla Vision Transformers (ViT) to complement spatiotemporal context given only limited visual content. Inspired by this, we propose Masked Action Recognition (MAR), which reduces redundant computation by discarding a proportion of patches and operating only on part of the video. MAR contains two essential components: cell running masking and a bridging classifier. Specifically, to enable the ViT to easily perceive details beyond the visible patches, cell running masking is presented to preserve the spatiotemporal correlations in videos, ensuring that patches at the same spatial location can be observed in turn for easy reconstruction. Additionally, we notice that although the partially observed features can reconstruct semantically explicit invisible patches, they fail to achieve accurate classification. To address this, a bridging classifier is proposed to bridge the semantic gap between the ViT-encoded features used for reconstruction and the features specialized for classification. Our proposed MAR reduces the computational cost of ViT by 53%, and extensive experiments show that MAR consistently outperforms existing ViT models by a notable margin. In particular, we find that a ViT-Large trained with MAR outperforms a ViT-Huge trained with the standard training scheme by convincing margins on both Kinetics-400 and Something-Something v2, while the computational overhead of ViT-Large is only 14.5% of that of ViT-Huge.
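The running pattern can be illustrated with a short mask generator. A hedged PyTorch sketch, assuming square cells of patch positions in which exactly one position is visible per frame and the visible position cycles over time; the exact pattern used by MAR may differ:

```python
import torch

def cell_running_mask(n_frames, h, w, cell=2):
    """Hypothetical running cell mask: within every cell x cell block of
    patch positions, exactly one position is visible per frame, and the
    visible position cycles over frames so that every spatial location
    is observed in turn (masking ratio = 1 - 1/cell**2)."""
    mask = torch.zeros(n_frames, h, w, dtype=torch.bool)
    for t in range(n_frames):
        k = t % (cell * cell)          # in-cell position visible at frame t
        di, dj = divmod(k, cell)
        mask[t, di::cell, dj::cell] = True
    return mask                        # True marks visible patches
```

With cell=2 this keeps one patch in four per frame (75% masked), while guaranteeing that every spatial location is visible within any window of cell² consecutive frames, which is what makes reconstruction of the hidden patches tractable.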